AITopics | Programming Languages

Dense Associative Memory with Epanechnikov Energy

Neural Information Processing Systems | Jun-21-2026, 15:18:19 GMT

We propose a novel energy function for Dense Associative Memory (DenseAM) networks, the log-sum-ReLU (LSR), inspired by optimal kernel density estimation. Unlike the common log-sum-exponential (LSE) function, LSR is based on the Epanechnikov kernel and enables exact memory retrieval with exponential capacity without requiring exponential separation functions. Moreover, it introduces abundant additional emergent local minima while preserving perfect pattern recovery -- a characteristic previously unseen in DenseAM literature. Empirical results show that LSR energy has significantly more local minima (memories) that have comparable log-likelihood to LSE-based models. Analysis of LSR's emergent memories on image datasets reveals a degree of creativity and novelty, hinting at this method's potential for both large-scale memory storage and generative tasks.
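As a rough illustration of the contrast drawn in the abstract above, the following Python sketch compares a log-sum-exponential energy with a log-sum-ReLU style energy built from an Epanechnikov-shaped kernel. The exact functional forms, the inverse temperature `beta`, and the small constant `eps` that keeps the logarithm finite are assumptions made for this sketch, not definitions taken from the paper.

```python
import numpy as np

def lse_energy(x, memories, beta=1.0):
    """Log-sum-exponential energy (assumed form): Gaussian kernel around each memory."""
    sq_dist = np.sum((memories - x) ** 2, axis=1)
    a = -0.5 * beta * sq_dist
    m = a.max()  # subtract the max for numerical stability
    return -(m + np.log(np.sum(np.exp(a - m)))) / beta

def lsr_energy(x, memories, beta=1.0, eps=1e-12):
    """Log-sum-ReLU energy (assumed form): Epanechnikov-shaped kernel with compact support."""
    sq_dist = np.sum((memories - x) ** 2, axis=1)
    kernel = np.maximum(0.0, 1.0 - 0.5 * beta * sq_dist)
    # eps is a hypothetical guard so the energy stays finite outside every kernel's support
    return -np.log(eps + np.sum(kernel)) / beta

memories = np.array([[0.0, 0.0], [10.0, 0.0]])
at_pattern = lsr_energy(memories[0], memories)
nearby = lsr_energy(memories[0] + np.array([0.1, 0.0]), memories)
print(at_pattern, nearby)
```

Because the ReLU kernel vanishes beyond a finite radius, a well-separated stored pattern is influenced only by its own kernel term, so in this sketch the energy has a minimum exactly at that pattern.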

artificial intelligence, guideline, justification, (14 more...)

Neural Information Processing Systems

Genre: Research Report (0.48)

Technology:

Information Technology > Artificial Intelligence > Systems & Languages > Programming Languages (0.61)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.61)

Sharp Capacity Thresholds in Linear Associative Memory: From Winner-Take-All to Listwise Retrieval

Barnfield, Nicholas, Kim, Juno, Nichani, Eshaan, Lee, Jason D., Lu, Yue M.

arXiv.org Machine Learning | May-7-2026

How many key-value associations can a $d\times d$ linear memory store? We show that the answer depends not only on the $d^2$ degrees of freedom in the memory matrix, but also on the retrieval criterion. In an isotropic Gaussian model for the stored pairs, we show that top-1 retrieval, where every signal must beat its largest distractor, requires the logarithmic model-size scale $d^2\asymp n\log n$. We prove that the correlation matrix memory construction, which stores associations by superposing key-target outer products, achieves this scale through a sharp phase transition, and that the same scaling is necessary for any linear memory. Thus the logarithm is the intrinsic extreme-value price of winner-take-all decoding. We next consider listwise retrieval, where the correct target need not be the unique top-scoring item but should remain among the strongest candidates. To formalize this regime, we propose the Tail-Average Margin (TAM), a convex upper-tail criterion that certifies inclusion of the correct target in a controlled candidate list. Under this listwise retrieval criterion, the capacity follows the quadratic scale $d^2\asymp n$. At load $n/d^2\to\alpha$, we develop an exact asymptotic theory for the TAM empirical-risk minimizer through a two-parameter scalar variational principle. The theory has a rich phenomenology: in the ridgeless limit it yields a closed-form critical load separating satisfiable and unsatisfiable phases, and it predicts the limiting laws of true scores, competitor scores, margins, and percentile profiles. Finally, a small-tail extrapolation further leads to the conjectural sharp top-1 threshold $d^2\sim 2n\log n$.
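The correlation matrix memory and the top-1 criterion described in the abstract above can be sketched in a few lines of Python. This is a minimal sketch under assumed conventions: keys and targets are i.i.d. Gaussian vectors scaled by $1/\sqrt{d}$, and the helper names `build_memory` and `top1_accuracy` are hypothetical, not from the paper.

```python
import numpy as np

def build_memory(keys, targets):
    """Superpose key-target outer products: W = sum_i v_i k_i^T, shape (d, d)."""
    return targets.T @ keys

def top1_accuracy(W, keys, targets):
    """Fraction of keys whose own target has the strictly highest score."""
    readouts = keys @ W.T            # row i is (W k_i)^T
    scores = readouts @ targets.T    # scores[i, j] = v_j^T W k_i
    return np.mean(np.argmax(scores, axis=1) == np.arange(len(keys)))

rng = np.random.default_rng(0)
d = 256
for n in (10, 1000, 20000):
    keys = rng.normal(size=(n, d)) / np.sqrt(d)
    targets = rng.normal(size=(n, d)) / np.sqrt(d)
    W = build_memory(keys, targets)
    print(n, top1_accuracy(W, keys, targets))
```

With $d$ fixed, the printed accuracy is expected to fall as $n$ grows, which is the qualitative behaviour the capacity scaling describes; the sketch makes no claim about where the sharp threshold lies.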

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2605.05189

Country: North America > United States (1.00)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Natural Language (0.67)
(3 more...)

Sinkhorn Based Associative Memory Retrieval Using Spherical Hellinger Kantorovich Dynamics

Mustafi, Aratrika, Mukherjee, Soumya

arXiv.org Machine Learning | Mar-24-2026

We propose a dense associative memory for empirical measures (weighted point clouds). Stored patterns and queries are finitely supported probability measures, and retrieval is defined by minimizing a Hopfield-style log-sum-exp energy built from the debiased Sinkhorn divergence. We derive retrieval dynamics as a spherical Hellinger Kantorovich (SHK) gradient flow, which updates both support locations and weights. Discretizing the flow yields a deterministic algorithm that uses Sinkhorn potentials to compute barycentric transport steps and a multiplicative simplex reweighting. Under local separation and PL-type conditions we prove basin invariance, geometric convergence to a local minimizer, and a bound showing the minimizer remains close to the corresponding stored pattern. Under a random pattern model, we further show that these Sinkhorn basins are disjoint with high probability, implying exponential capacity in the ambient dimension. Experiments on synthetic Gaussian point-cloud memories demonstrate robust recovery from perturbed queries versus a Euclidean Hopfield-type baseline.

amin, machine learning, natural language, (15 more...)

arXiv.org Machine Learning

2603.20656

Country:

North America > United States > Pennsylvania (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Natural Language (0.67)
Information Technology > Artificial Intelligence > Systems & Languages > Programming Languages (0.61)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.61)

Provably Optimal Memory Capacity for Modern Hopfield Models: Transformer-Compatible Dense Associative Memories as Spherical Codes

Hu, Jerry Yao-Chieh, Wu, Dennis

Neural Information Processing Systems | Feb-16-2026, 05:44:26 GMT

We study the optimal memorization capacity of modern Hopfield models and Kernelized Hopfield Models (KHMs), a transformer-compatible class of Dense Associative Memories.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

South America > Brazil (0.04)
North America > United States > Illinois > Cook County > Evanston (0.04)
Europe > Austria > Vienna (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
(4 more...)

QueryPose: Sparse Multi-Person Pose Regression via Spatial-Aware Part-Level Query

Neural Information Processing Systems | Feb-8-2026, 22:46:50 GMT

The two independent models lead to a non-end-to-end pipeline, also called a two-stage pipeline. Moreover, the human detector incurs extra memory and computational cost. The bottom-up strategy [16, 17, 18, 19] first uses the keypoint heatmap to locate all person keypoints and then assigns them to individuals via a heuristic grouping process, as shown in Figure 1(a).

artificial intelligence, machine learning, part-level query, (16 more...)

Neural Information Processing Systems

Country:

Asia > China > Beijing > Beijing (0.05)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Systems & Languages > Programming Languages (0.34)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.34)

Temporal Complexity and Self-Organization in an Exponential Dense Associative Memory Model

Cafiso, Marco, Paradisi, Paolo

arXiv.org Machine Learning | Jan-19-2026

Dense Associative Memory (DAM) models generalize the classical Hopfield model by incorporating n-body or exponential interactions that greatly enhance storage capacity. While the criticality of DAM models has been largely investigated, mainly within a statistical equilibrium picture, little attention has been devoted to the temporal self-organizing behavior induced by learning. In this work, we investigate the behavior of a stochastic exponential DAM (SEDAM) model through the lens of Temporal Complexity (TC), a framework that characterizes complex systems by intermittent transition events between order and disorder and by scale-free temporal statistics. Transition events associated with birth-death of neural avalanche structures are exploited for the TC analyses and compared with analogous transition events based on coincidence structures. We systematically explore how TC indicators depend on control parameters, i.e., noise intensity and memory load. Our results reveal that the SEDAM model exhibits regimes of complex intermittency characterized by nontrivial temporal correlations and scale-free behavior, indicating the spontaneous emergence of self-organizing dynamics. These regimes emerge in small intervals of noise intensity values, which, in agreement with the extended criticality concept, never shrink to a single critical point. Further, the noise intensity range needed to reach the critical region, where self-organizing behavior emerges, slightly decreases as the memory load increases. This study highlights the relevance of TC as a complementary framework for understanding learning and information processing in artificial and biological neural systems, revealing the link between the memory load and the self-organizing capacity of the network.

artificial intelligence, complexity, machine learning, (18 more...)

arXiv.org Machine Learning

2601.11478

Country:

North America > United States (0.28)
Europe > Italy (0.28)
Europe > Spain (0.28)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)
Information Technology > Artificial Intelligence > Systems & Languages > Programming Languages (0.61)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.61)

Dense Associative Memory Through the Lens of Random Features

Neural Information Processing Systems | Dec-24-2025, 14:13:07 GMT

Dense Associative Memories are high storage capacity variants of the Hopfield networks that are capable of storing a large number of memory patterns in the weights of the network of a given size. Their common formulations typically require storing each pattern in a separate set of synaptic weights, which leads to the increase of the number of synaptic weights when new patterns are introduced. In this work we propose an alternative formulation of this class of models using random features, commonly used in kernel methods. In this formulation the number of network's parameters remains fixed. At the same time, new memories can be added to the network by modifying existing weights. We show that this novel network closely approximates the energy function and dynamics of conventional Dense Associative Memories and shares their desirable computational properties.
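The idea in the abstract above, a memory whose parameter count stays fixed while new patterns are added by updating existing weights, can be sketched with random Fourier features for a Gaussian kernel. This is only an illustrative sketch: the choice of random Fourier features, the log-sum-exponential energy it approximates, the clipping constant `eps`, and the class name `RandomFeatureMemory` are all assumptions and may differ from the paper's construction.

```python
import numpy as np

def make_feature_map(dim, num_features, beta, seed=0):
    """Random Fourier features with phi(x) . phi(y) ~ exp(-beta/2 * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    omega = rng.normal(scale=np.sqrt(beta), size=(num_features, dim))
    phase = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return lambda x: np.sqrt(2.0 / num_features) * np.cos(omega @ x + phase)

def exact_energy(x, memories, beta):
    """Reference energy that keeps every stored pattern explicitly."""
    sq_dist = np.sum((memories - x) ** 2, axis=1)
    return -np.log(np.sum(np.exp(-0.5 * beta * sq_dist))) / beta

class RandomFeatureMemory:
    """Hypothetical fixed-size memory: one weight vector of length num_features."""
    def __init__(self, phi, num_features):
        self.phi = phi
        self.T = np.zeros(num_features)

    def add(self, pattern):
        # Storing a pattern only modifies existing weights; the size of T never changes.
        self.T += self.phi(pattern)

    def energy(self, x, beta, eps=1e-6):
        # eps is an assumed guard: the kernel estimate can dip below zero from sampling noise.
        return -np.log(max(self.phi(x) @ self.T, eps)) / beta

beta, dim, num_features = 1.0, 4, 20000
rng = np.random.default_rng(1)
memories = rng.normal(scale=0.5, size=(5, dim))
memory = RandomFeatureMemory(make_feature_map(dim, num_features, beta), num_features)
for pattern in memories:
    memory.add(pattern)
print(exact_energy(memories[0], memories, beta), memory.energy(memories[0], beta))
```

The two printed energies should be close, and the gap is expected to shrink as `num_features` grows, at the cost of a larger (but still pattern-count-independent) weight vector.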

artificial intelligence, dense associative memory, machine learning, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Systems & Languages > Programming Languages (0.91)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.91)

CAMformer: Associative Memory is All You Need

Molom-Ochir, Tergel, Morris, Benjamin F., Horton, Mark, Wei, Chiyue, Guo, Cong, Taylor, Brady, Liu, Peter, Wang, Shan X., Fan, Deliang, Li, Hai Helen, Chen, Yiran

arXiv.org Artificial Intelligence | Nov-26-2025

Transformers face scalability challenges due to the quadratic cost of attention, which involves dense similarity computations between queries and keys. We propose CAMformer, a novel accelerator that reinterprets attention as an associative memory operation and computes attention scores using a voltage-domain Binary Attention Content Addressable Memory (BA-CAM). This enables constant-time similarity search through analog charge sharing, replacing digital arithmetic with physical similarity sensing. CAMformer integrates hierarchical two-stage top-k filtering, pipelined execution, and high-precision contextualization to achieve both algorithmic accuracy and architectural efficiency. Evaluated on BERT and Vision Transformer workloads, CAMformer achieves over 10x energy efficiency, up to 4x higher throughput, and 6-8x lower area compared to state-of-the-art accelerators--while maintaining near-lossless accuracy.
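A software analogy may help to picture attention treated as an associative lookup, as in the abstract above. The Python sketch below binarizes the query and keys by sign, scores each key by the number of agreeing bits, keeps the top-k candidates, and only then forms a full-precision weighted sum of values. It is not a model of the BA-CAM circuit or of the paper's two-stage filter: the single-stage top-k, the use of raw match counts as softmax logits, and the function name `binary_topk_attention` are assumptions made for illustration.

```python
import numpy as np

def binary_topk_attention(q, K, V, k=4):
    """Binary match scoring, top-k filtering, then full-precision value mixing."""
    q_bits = q > 0                        # sign binarization of the query
    K_bits = K > 0                        # sign binarization of every key
    matches = np.sum(K_bits == q_bits, axis=1)   # agreeing bits per key
    top = np.argsort(-matches, kind="stable")[:k]
    logits = matches[top].astype(float)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return top, weights @ V[top]          # only k value rows are touched

rng = np.random.default_rng(0)
K = rng.normal(size=(8, 32))
V = rng.normal(size=(8, 16))
q = K[3]                                  # a query identical to key 3
top, out = binary_topk_attention(q, K, V)
print(top, out.shape)
```

In this sketch the expensive full-precision work is limited to the `k` selected rows of `V`, which is the software counterpart of filtering candidates before contextualization.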

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2511.1974

Country: North America > United States (0.46)

Genre:

Research Report (0.50)
Overview (0.46)

Industry: Energy (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.71)
Information Technology > Artificial Intelligence > Systems & Languages > Programming Languages (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Provably Optimal Memory Capacity for Modern Hopfield Models: Transformer-Compatible Dense Associative Memories as Spherical Codes

Hu, Jerry Yao-Chieh, Wu, Dennis

Neural Information Processing Systems | Oct-10-2025, 07:54:57 GMT

We study the optimal memorization capacity of modern Hopfield models and Kernelized Hopfield Models (KHMs), a transformer-compatible class of Dense Associative Memories.

arxiv preprint arxiv, memory capacity, spherical code, (13 more...)

Neural Information Processing Systems

Country:

South America > Brazil (0.04)
North America > United States > Illinois > Cook County > Evanston (0.04)
Europe > Austria > Vienna (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
(4 more...)

Muon Outperforms Adam in Tail-End Associative Memory Learning

Wang, Shuche, Zhang, Fengzhuo, Li, Jiaxiang, Du, Cunxiao, Du, Chao, Pang, Tianyu, Yang, Zhuoran, Hong, Mingyi, Tan, Vincent Y. F.

arXiv.org Artificial Intelligence | Oct-7-2025

The Muon optimizer is consistently faster than Adam in training Large Language Models (LLMs), yet the mechanism underlying its success remains unclear. This paper demystifies this mechanism through the lens of associative memory. By ablating the transformer components optimized by Muon, we reveal that the associative memory parameters of LLMs, namely the Value and Output (VO) attention weights and Feed-Forward Networks (FFNs), are the primary contributors to Muon's superiority. Motivated by this associative memory view, we then explain Muon's superiority on real-world corpora, which are intrinsically heavy-tailed: a few classes (tail classes) appear far less frequently than others. The superiority is explained through two key properties: (i) its update rule consistently yields a more isotropic singular spectrum than Adam; and as a result, (ii) on heavy-tailed data, it optimizes tail classes more effectively than Adam. Beyond empirical evidence, we theoretically confirm these findings by analyzing a one-layer associative memory model under class-imbalanced data. We prove that Muon consistently achieves balanced learning across classes regardless of feature embeddings, whereas Adam can induce large disparities in learning errors depending on embedding properties. In summary, our empirical observations and theoretical analyses reveal Muon's core advantage: its update rule aligns with the outer-product structure of linear associative memories, enabling more balanced and effective learning of tail classes in heavy-tailed distributions than Adam.
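The abstract's point about isotropic updates and the outer-product structure of a linear associative memory can be illustrated with a toy calculation. In the Python sketch below, the raw update is a frequency-weighted sum of outer products $\sum_c p_c\, v_c k_c^\top$ with orthonormal keys and values, and the orthogonalized update replaces its nonzero singular values by one. Muon approximates this orthogonalization with a Newton-Schulz iteration; an exact SVD is swapped in here for clarity. The toy setup, the class frequencies, and the helper names are assumptions for illustration, not the paper's experimental protocol.

```python
import numpy as np

def orthogonalize(G, tol=1e-10):
    """Set every nonzero singular value of G to 1 (exact SVD stand-in for Newton-Schulz)."""
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    keep = s > tol * s.max()              # drop the numerically zero directions
    return U[:, keep] @ Vt[keep, :]

def class_update_strengths(freq, d=16, seed=0):
    """Per-class update strength v_c^T (update) k_c, before and after orthogonalization."""
    rng = np.random.default_rng(seed)
    C = len(freq)
    K = np.linalg.qr(rng.normal(size=(d, C)))[0]   # orthonormal key embeddings
    V = np.linalg.qr(rng.normal(size=(d, C)))[0]   # orthonormal value embeddings
    G = (V * freq) @ K.T                           # sum_c freq_c * v_c k_c^T
    O = orthogonalize(G)
    raw = np.array([V[:, c] @ G @ K[:, c] for c in range(C)])
    orth = np.array([V[:, c] @ O @ K[:, c] for c in range(C)])
    return raw, orth

freq = np.array([0.7, 0.2, 0.09, 0.01])   # assumed heavy-tailed class frequencies
raw, orth = class_update_strengths(freq)
print(raw)
print(orth)
```

In this toy setting the raw update strengths track the class frequencies, so the rarest class barely moves, while the orthogonalized update treats every class equally.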

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2509.2603

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Systems & Languages > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
